ABSTRACT



The Worker Profiling and Reemployment Services (WPRS) 
program requires states to establish systems for identifying Unemployment 
Insurance (UI) claimants likely to exhaust their UI benefits and referring them
to reemployment services. An evaluation was conducted to assess the 
reliability of the impact estimates provided in the evaluation of the WPRS 
program, and to compute revised estimates of the impacts of WPRS programs if 
a more accurate estimation method could be identified. Data for the 
evaluation were gathered from the Job Search Assistance Demonstration in 
Florida, which, beginning in 1995, randomly assigned UI claimants to control 
groups or treatment groups who received training in job search techniques. 

The evaluation was planned in two phases: in Phase I, the demonstration data 
were used to test both the regression method used in the WPRS evaluation and 
variants of the matching methods used in other evaluations; in Phase II, any 
matching method that proved more accurate would be applied to actual WPRS 
data. The Phase I evaluation found that the linear regression model used in 
the WPRS evaluation produced accurate impact estimates, while the matched 
comparison groups tested in this evaluation produced less accurate impact 
estimates than the linear regression model. Based on the results of Phase I, 
therefore, it was decided not to proceed to Phase II of the evaluation. 
(Contains 10 references.) (KC) 
EXECUTIVE SUMMARY 




This evaluation is motivated by two goals: (1) to assess the reliability of the impact 
estimates provided in the evaluation of the Worker Profiling and Reemployment Services 
(WPRS) programs, and (2) to compute revised estimates of the impacts of WPRS programs if a 
more accurate estimation method can be identified. The evaluation also provides general 
information on the accuracy of different methods for estimating impacts without random 
assignment. 

Under WPRS, states were required to establish systems for identifying Unemployment 
Insurance (UI) claimants likely to exhaust their UI benefits and referring them to reemployment 
services, such as resume preparation and training in job search methods. In an evaluation 
sponsored by the U.S. Department of Labor (USDOL), the impacts of WPRS were estimated by 
comparing UI claimants who were assigned to WPRS services (the treatment group) to claimants 
who were not assigned to WPRS services (the comparison group). Linear regression techniques 
were used to control for pre-existing differences between the two groups. 

The results from the WPRS evaluation suggest that the impacts of WPRS on earnings are 
positive in some states and negative in others. However, the wide variation in impact estimates 
across states raises questions about the accuracy of the estimates. Furthermore, when the 
pre-existing differences between the treatment and comparison groups are large (as in the WPRS 
evaluation), linear regression methods can be unreliable. Therefore, the wide state-to-state 
variation in the estimated earnings impacts may be due to estimation error attributable to the 
regression method used in the WPRS evaluation. 

Prior to the implementation of WPRS, USDOL sponsored a demonstration to test different 
program models that are consistent with the regulations governing WPRS. In 1995, the Job 
Search Assistance (JSA) Demonstration was implemented in the District of Columbia and in 
selected counties in Florida. Because the demonstration was based on the random assignment of 
eligible claimants to treatment and control groups, impacts were estimated by comparing 
treatment group members to control group members. Random assignment ensured that the 
pre-existing differences between the two groups were negligible. 

Therefore, the demonstration should provide reliable estimates of the impacts of different 
WPRS program models via treatment-control differences. Furthermore, demonstration data can 
be used to compute other impact estimates using data that mimic the treatment and comparison 
samples available to the WPRS evaluation. The reliability of these impact estimates can be 
tested by comparing them to the treatment-control differences. 

In this evaluation, we use data from the JSA Demonstration in Florida to mimic the 
treatment and comparison samples from the WPRS evaluation, and to test different methods of 
estimating impacts from these samples. These methods include the regression method used in 
the WPRS evaluation, but also include variants of the matching methods used in other 
evaluations. Matching is designed to select a subgroup of comparison group members who are 
similar to treatment group members. Impacts are then estimated by comparing treatment group 
members to the subgroup of similar comparison group members. 
The plan for the evaluation included two phases: 

• Phase I: Testing Different Methods of Estimating Impacts Using JSA Data. In 
Phase I, use data from the JSA Demonstration to assess the reliability of the 
regression method employed in the WPRS evaluation and the matching methods 
developed in this evaluation. 

• Phase II: Applying Matching Methods to Actual WPRS Data. If any of the 
matching methods produce more accurate impact estimates than the regression 
method, apply those matching methods to WPRS data to obtain revised estimates of 
the impacts of WPRS on earnings. 



DESIGN OF PHASE I OF THE EVALUATION 

The design of Phase I consisted of two components: (1) identifying the analysis samples 
from JSA Demonstration data; and (2) specifying methods for estimating the impacts of being 
assigned to JSA/WPRS services on the claimants who would have been assigned to services if 
WPRS had been operating in Florida in place of the demonstration. 

Identifying Three Samples from JSA Demonstration Data. We used the rule by which UI 
claimants are assigned to WPRS to determine which claimants would have been assigned to 
WPRS had it been operating instead of the demonstration. Claimants who would have been 
assigned to WPRS were classified as “treatment claimants” or “control claimants” for this 
evaluation based on their treatment-control status in the demonstration. Claimants who would 
not have been assigned to WPRS (and were not treated in the demonstration) were classified as 
“comparison claimants”. 
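The classification rule can be expressed as a small Python function. This is a sketch, not the evaluation's code: the input names (`would_be_wprs`, `demo_status`) and the status labels are hypothetical, and the sketch assumes the would-be WPRS assignment has already been determined for each claimant. Claimants who fit none of the three definitions (for example, demonstration treatment claimants whom WPRS would not have selected) are returned as `None`.

```python
from typing import Optional


def classify_claimant(would_be_wprs: bool, demo_status: str) -> Optional[str]:
    """Place one claimant in the treatment, control, or comparison sample.

    would_be_wprs: True if the WPRS assignment rule would have selected the
        claimant (taken as given here).
    demo_status: hypothetical coding of demonstration status, one of
        "treatment", "control", or "not_in_demo".
    Returns the sample name, or None if the claimant fits none of the samples.
    """
    if would_be_wprs:
        # Would-be WPRS claimants keep their random-assignment status.
        if demo_status in ("treatment", "control"):
            return demo_status
        # Assumption: would-be WPRS claimants outside the demonstration are excluded.
        return None
    # Claimants WPRS would not have selected are comparison claimants,
    # provided they were not treated in the demonstration.
    if demo_status != "treatment":
        return "comparison"
    return None
```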

Specifying the Methods for Estimating Impacts. Based on the three analysis samples, we 
specified alternative methods of estimating the impacts of being assigned to WPRS. The 
experimental benchmark estimate equals the mean earnings of treatment claimants minus the 
mean earnings of control claimants. This benchmark is used to assess whether accurate impact 
estimates can be computed from “nonexperimental data”— data on treatment and comparison 
claimants— using either the linear regression method from the WPRS evaluation or one of the 
matched comparison groups developed for this evaluation. 
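A minimal Python sketch of the benchmark calculation follows. The optional weights stand in for sampling weights such as those this report assigns to treatment claimants; whether and how the benchmark itself was weighted is an assumption here.

```python
from typing import Optional, Sequence


def weighted_mean(values: Sequence[float],
                  weights: Optional[Sequence[float]] = None) -> float:
    """Weighted average; without weights this is the simple mean."""
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def benchmark_impact(treat_earnings: Sequence[float],
                     control_earnings: Sequence[float],
                     treat_weights: Optional[Sequence[float]] = None,
                     control_weights: Optional[Sequence[float]] = None) -> float:
    """Experimental benchmark: mean treatment earnings minus mean control earnings."""
    return (weighted_mean(treat_earnings, treat_weights)
            - weighted_mean(control_earnings, control_weights))
```

For example, `benchmark_impact([1000.0, 1200.0], [900.0, 1000.0])` returns 150.0; these earnings figures are made up for illustration and are not demonstration data.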

The matching methods developed for this evaluation are designed to select “matched 
comparison groups” that look like the treatment group. A comparison claimant is selected for 
the matched comparison group if he or she can be “matched” to one or more treatment claimants 
with similar characteristics. The rules developed for defining acceptable matches require that 
matched claimants have the same sex, race/ethnicity, and education. Furthermore, matching 
claimants must have similar values for one of the following three variables: 

1. Profiling Score. UI claimants are assigned “profiling scores” that reflect the 
probability of exhausting UI benefits without additional reemployment services, 
and are assigned to WPRS based on these scores. 
2. Base-year Earnings. Claimants are determined eligible for UI based on their 
“base-year earnings”, which measures total earnings in four out of five quarters 
prior to the UI claim. 

3. Propensity Score. Treatment claimants have higher probabilities or propensities of 
being assigned to services than comparison claimants, and “propensity scores” are 
often computed in evaluations to use as matching variables. 
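The matching rule can be sketched in Python as below. The field names, the caliper (the maximum allowed difference on the continuous variable), and the choice to rank candidates by absolute difference are assumptions; this section does not say how close two claimants must be to count as “similar”. Because a comparison claimant may be matched to more than one treatment claimant, the function searches the full comparison pool for each treatment claimant, that is, it matches with replacement.

```python
from typing import Dict, List, Optional, Sequence

# Characteristics on which matched claimants must agree exactly.
EXACT_KEYS = ("sex", "race", "education")


def find_matches(treated: Dict, comparisons: Sequence[Dict], var: str,
                 caliper: float, max_matches: Optional[int] = 1) -> List[int]:
    """Return indices of comparison claimants matched to one treatment claimant.

    A candidate must agree exactly on sex, race/ethnicity, and education and
    lie within `caliper` of the treatment claimant on the continuous variable
    `var` (profiling score, base-year earnings, or propensity score).
    Candidates are ordered from closest to farthest; max_matches=None keeps
    every acceptable candidate (hypothetical parameters).
    """
    candidates = [
        (abs(c[var] - treated[var]), j)
        for j, c in enumerate(comparisons)
        if all(c[k] == treated[k] for k in EXACT_KEYS)
        and abs(c[var] - treated[var]) <= caliper
    ]
    candidates.sort()
    chosen = candidates if max_matches is None else candidates[:max_matches]
    return [j for _, j in chosen]
```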



FINDINGS FROM PHASE I OF THE EVALUATION 

Based on the treatment and control groups in this evaluation, the experimental benchmark 
estimate that we use to assess the accuracy of other impact estimates equals $260. That is, the 
average earnings of treatment claimants in the year following the quarter of random assignment 
were $260 higher than the average earnings of control claimants in the same year. 

How well did the different methods for estimating earnings impacts from the treatment and 
comparison samples perform? The two main findings from Phase I of the evaluation are given 
below: 

1. The linear regression model used in the WPRS evaluation produced accurate 
impact estimates. The estimate produced by the linear regression model from the 
WPRS evaluation equals $308, which is very close to the experimental benchmark 
of $260. 

2. The matched comparison groups tested in this evaluation produced less accurate 
impact estimates than the linear regression model. The impact estimates based on 
matched comparison groups range from -$111 to -$3,440, and none of these 
estimates are as close to the experimental benchmark as the estimate produced by 
the linear regression model. 
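The comparison behind these two findings is simple arithmetic: each nonexperimental estimate's error is its distance from the $260 benchmark. The Python snippet below uses only the figures reported above; the matched estimates between the two ends of the reported range are not listed in this summary, so only the endpoints appear.

```python
BENCHMARK = 260  # experimental benchmark: treatment minus control mean earnings ($)

estimates = {
    "linear regression (WPRS evaluation)": 308,
    "matched comparison, closest to benchmark": -111,
    "matched comparison, farthest from benchmark": -3440,
}

# Estimation error: how far each nonexperimental estimate lies from the benchmark.
errors = {method: est - BENCHMARK for method, est in estimates.items()}

for method, err in errors.items():
    print(f"{method}: {err:+,} dollars")
```

The regression estimate is off by $48, while even the best matched comparison group misses the benchmark by $371.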



Therefore, despite the general concerns that can be raised about the reliability of regression 
methods to adjust for large differences between treatment and comparison groups, this 
evaluation provides no evidence that the regression methods used in the WPRS evaluation are 
unreliable. 

The poor performance of the matching methods tested in this evaluation can be attributed to 
the difficulty in selecting matched comparison groups that are sufficiently similar to the 
treatment group. Each matched comparison group was similar to the treatment group on many 
dimensions but different from the treatment group in at least one dimension that proved to be 
important. None of the matched comparison groups had the same (or a very similar) distribution 
of claimants across the local offices in the demonstration as the treatment group. Findings in this 
report suggest that it may be impossible to create a matched comparison group that is 
comparable to the treatment group in the distribution of claimants across local offices, and is also 
comparable to the treatment group in other important dimensions, such as sex, race/ethnicity, 
education, the profiling score, base-year earnings, and the propensity score. In other words, we 
were unable to create a matched comparison group that was comparable to the treatment group 
on all the dimensions that seemed important. 
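One way to check the dimension that proved decisive is to compare the weighted distribution of claimants across local offices in the treatment group and in a matched comparison group. The Python sketch below reports the largest gap in office shares; this summary statistic is an illustrative choice, not necessarily the diagnostic used in this evaluation, and the input format is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def office_shares(claimants: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Weighted share of a group's claimants in each local office.

    `claimants` yields (office, weight) pairs.
    """
    totals: Dict[str, float] = defaultdict(float)
    for office, weight in claimants:
        totals[office] += weight
    grand_total = sum(totals.values())
    return {office: w / grand_total for office, w in totals.items()}


def max_share_gap(treatment: Iterable[Tuple[str, float]],
                  comparison: Iterable[Tuple[str, float]]) -> float:
    """Largest absolute difference in office shares between two groups."""
    t, c = office_shares(treatment), office_shares(comparison)
    offices = set(t) | set(c)  # an office missing from one group has share 0
    return max(abs(t.get(o, 0.0) - c.get(o, 0.0)) for o in offices)
```

A gap of zero would mean the two groups are spread across offices in identical proportions.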
APPENDIX A: WEIGHTING THE FOUR MATCHED COMPARISON GROUPS 



The four matched comparison groups we selected were used to compute four different 
estimates of the impacts of the treatment on claimants who would have been assigned to WPRS. 
These impact estimates are computed by subtracting the average earnings of matched 
comparison group members from the average earnings of treatment group members. However, 
these averages are not simple, unweighted averages. As shown previously in Table 4, we assign 
weights to treatment claimants that reflect the sampling probabilities in the local offices where 
they applied for benefits. As described in this section, we assign weights to matched comparison 
group members that reflect the sampling probabilities of the treatment group members to which 
they were matched. The weights for matched comparison claimants were designed to ensure that 
their sum equals the sum of the weights for treatment claimants. 

Weighting the treatment group was straightforward: 

• Treatment claimants matching at least one comparison claimant are assigned weights 
based on Table 4. 

• Unmatched treatment claimants are assigned weights of zero, which effectively 
drops them from the analysis. 15 

Dropping unmatched treatment claimants from the analysis is undesirable because if too 
many treatment claimants are dropped, it becomes difficult to generalize the estimated impacts 
for matched treatment claimants to all treatment claimants. However, including unmatched 
treatment claimants in the analysis would guarantee that the treatment and matched comparison 
groups are systematically different due to the unmatched treatment claimants, and would 
therefore defeat the point of selecting matched comparison groups. 

Each matched comparison claimant is weighted to reflect the number of claimants 
represented by the treatment claimants to whom he or she is matched. Suppose, for example, a 
comparison claimant is matched to two treatment claimants, one from Clearwater and the other 
from Davie. Table 4 indicates that the weights for these treatment claimants are 3.76 and 7.44, 
respectively. Under matching rules 1-3, the weight assigned to the matched comparison claimant 
would be 11.20, the sum of the two treatment weights, to account for the 3.76 claimants 
represented by each treatment claimant in Clearwater and the 7.44 claimants represented by each 
treatment claimant in Davie. 

Weighting the matched comparison group generated by matching rule 4 is more complicated 
because this rule allows each treatment claimant to match multiple comparison claimants. The 
multiple comparison claimants matching a single treatment claimant are weighted so that 
together they reflect the claimants represented by the treatment claimant. Consider the example 
from the previous paragraph, but suppose that a second comparison claimant matched the 



15 Unmatchable treatment claimants are not dropped from the analysis files used to compute the 
regression-based estimates. However, regression-based estimates can be heavily influenced by outliers, such as peculiar 
treatment units that cannot be matched to any comparison units. Furthermore, regression-based estimates will often 
provide poor impact estimates for treatment units without any similar comparison units. 
treatment claimant from Davie. We would weight each of the two matching comparison 
claimants to represent half of the claimants represented by the treatment claimant from Davie, or 
3.72 claimants (half the treatment group weight of 7.44). However, the first of the two comparison 
claimants was also matched to the treatment claimant from Clearwater, so this comparison 
claimant would receive a weight of 7.48 (3.72 + 3.76). 
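The weighting rule described above can be written as a short Python function that reproduces both worked examples (treatment weights of 3.76 in Clearwater and 7.44 in Davie). The claimant identifiers are hypothetical, and the sketch assumes, as the contrast with rule 4 implies, that rules 1-3 link each treatment claimant to a single comparison claimant, in which case the equal split reduces to summing treatment weights.

```python
from collections import defaultdict
from typing import Dict, List


def comparison_weights(matches: Dict[str, List[str]],
                       treatment_weights: Dict[str, float]) -> Dict[str, float]:
    """Weight matched comparison claimants from their treatment matches.

    matches maps each treatment claimant to the comparison claimants it
    matched. Each treatment weight is split equally among its matches, and a
    comparison claimant's weight is the sum of the shares it receives, so the
    comparison weights add up to the weights of the matched treatment claimants.
    """
    weights: Dict[str, float] = defaultdict(float)
    for t_id, comp_ids in matches.items():
        if not comp_ids:
            continue  # unmatched treatment claimant: weight zero, drops out
        share = treatment_weights[t_id] / len(comp_ids)
        for c_id in comp_ids:
            weights[c_id] += share
    return dict(weights)


treatment_weights = {"clearwater": 3.76, "davie": 7.44}
# Rule 4 example: A matches both treatment claimants, B matches Davie only.
rule4 = comparison_weights({"clearwater": ["A"], "davie": ["A", "B"]}, treatment_weights)
print({k: round(v, 2) for k, v in rule4.items()})  # {'A': 7.48, 'B': 3.72}
```

Under rules 1-3, the single comparison claimant matched to both treatment claimants would instead receive 3.76 + 7.44 = 11.20.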

The weights for matched comparison claimants depend on the number of treatment 
claimants to which they matched. Furthermore, different random samples based on the sampling 
plan described in Chapter II would produce different treatment and comparison samples, 
different matched pairs of treatment and comparison claimants, and different weights for 
matched comparison claimants. Therefore, the weights for matched comparison claimants 
contain sampling variation. When weights contain sampling variation, the least squares estimate 
of the standard error of the impact estimate will be biased. Therefore, it is necessary to account 
for sampling variation in the weights when estimating the standard errors of the impact estimates 
generated by our four matched comparison groups. Our solution to this problem is to estimate 
these four standard errors via bootstrapping. For comparability between the standard errors of all 
six impact estimates presented in this report, we estimate the standard errors of all six impact 
estimates via bootstrapping. 
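A minimal Python sketch of the bootstrap follows. It assumes that claimants are resampled with replacement separately within the treatment and comparison samples, and that the whole estimation procedure, including matching and weighting, is rerun on every replicate so that the variation in the weights carries into the standard error. The resampling design actually used in this evaluation (for example, whether it followed the office-level sampling plan) is not described here, and the replicate count and seed are arbitrary.

```python
import random
import statistics
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_se(treatment: Sequence[T], comparison: Sequence[T],
                 estimator: Callable[[List[T], List[T]], float],
                 reps: int = 500, seed: int = 0) -> float:
    """Bootstrap standard error of an impact estimate.

    Each replicate draws treatment and comparison claimants with replacement
    and reruns `estimator`, which should redo any matching and weighting.
    The standard deviation of the replicate estimates is the standard error.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        t_star = rng.choices(treatment, k=len(treatment))
        c_star = rng.choices(comparison, k=len(comparison))
        estimates.append(estimator(t_star, c_star))
    return statistics.stdev(estimates)
```

Passing a simple difference-in-means estimator gives the usual bootstrap standard error for unweighted data; passing a function that matches, weights, and then differences means captures the extra variation the text describes.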



CHAPTER I: INTRODUCTION 



This evaluation is motivated by two specific goals and one more general goal. The specific 
goals are to (1) assess the reliability of the impact estimates provided in the evaluation of the 
Worker Profiling and Reemployment Services (WPRS) programs, and (2) compute revised 
estimates of the impacts of WPRS if a more accurate estimation method can be identified. The 
more general goal of this evaluation is to provide information about which estimation methods 
are most accurate when computing impact estimates from nonexperimental data—data without a 
randomly assigned control group. 

WPRS was created in response to a 1993 amendment to the Social Security Act. This 
amendment required states to establish profiling systems for targeting Unemployment Insurance 
(UI) claimants likely to remain unemployed long enough to exhaust their UI benefits, and for 
referring targeted claimants to reemployment services shortly after they apply for benefits 
(Dickinson et al., 1999). The program model implemented under WPRS varies across states. In 
some states, all claimants assigned to the program are required to participate in the same set of 
services. In other states, counselors have more discretion in specifying the services in which 
each claimant must participate to remain eligible for UI benefits. These services include training 
in job search methods, resume preparation, job development, and referrals to job openings. 

Prior to the implementation of WPRS, the U.S. Department of Labor (USDOL) sponsored a 
demonstration to test different program models that are consistent with the regulations governing 
WPRS. The Job Search Assistance (JSA) Demonstration was implemented in the District of 
Columbia and selected counties in Florida in 1995, and it continued to operate in 1996 when the 
implementation of WPRS began. The evaluation of the demonstration (Decker et al., 2000) was 
based on the random assignment of eligible claimants to three treatment groups and one control 
group. The first treatment was based on a list of services in which all claimants were required to 
participate. The second two treatments allowed counselors to determine the required services for 
each claimant on an individualized basis. The key outcome variables in this evaluation were (1) 
UI benefits and duration, and (2) employment and earnings. Because the treatment impacts were 
measured relative to a randomly selected control group, the resulting impact estimates are more 
credible than those measured relative to a nonrandom comparison group whose members may 
differ systematically from treatment group members. 

During the evaluation of the JSA Demonstration, USDOL also sponsored an evaluation of 
the WPRS program itself in six states (Dickinson et al., 1999). The WPRS evaluation can be 
justified by two problems with generalizing the findings from the demonstration to WPRS. First, 
the rule by which UI claimants were assigned to demonstration services was different from the 
rule by which UI claimants are assigned to WPRS services in most states. Therefore, WPRS 
targets a different set of claimants than those who would have been eligible for the 
demonstration. Second, WPRS service models differ across counties and states, so the results 
from the District of Columbia and ten counties in Florida may be unrepresentative of WPRS 
nationwide. 

Unlike the JSA Demonstration, the WPRS evaluation lacked the benefit of a randomly 
assigned control group: impacts were estimated by comparing UI claimants assigned to WPRS 
services (the “treatment” group) to a comparison group consisting of claimants who were not 
assigned to WPRS services. Due to the rule by which claimants are assigned to services, 
comparison group members in the evaluation had systematically different baseline characteristics 
than treatment group members. The WPRS evaluation accounted for baseline differences 
between the two groups using regression analysis. However, when the differences between two 
groups are large, linear regression can produce biased impact estimates (Cochran, 1965). 
Furthermore, the estimated earnings impacts from the WPRS report varied considerably across 
states (Dickinson et al., 1999, Exhibit III-9). This variation across states may be “real”: some 
state programs may be much more effective than others. However, given the difficulties in 
estimating impacts without a randomly assigned control group, the wide variation in impact 
estimates across states raises questions about the accuracy of the estimates. 
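The regression adjustment can be sketched as ordinary least squares of earnings on an intercept, a treatment indicator, and baseline covariates, with the coefficient on the indicator serving as the impact estimate. The Python below is a generic sketch, not the WPRS evaluation's specification (its covariates are not listed in this section); it solves the normal equations directly to stay dependency-free. When treatment and comparison claimants overlap little on the covariates, the coefficient depends heavily on the assumed linear form, which is the concern raised above.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def regression_impact(earnings: Sequence[float], treated: Sequence[int],
                      covariates: Sequence[Sequence[float]]) -> float:
    """OLS of earnings on [1, treatment dummy, covariates]; returns the dummy's coefficient."""
    rows = [[1.0, float(d)] + list(x) for d, x in zip(treated, covariates)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, earnings)) for i in range(p)]
    return solve(xtx, xty)[1]
```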

One alternative method of adjusting for baseline differences between treatment and 
comparison groups is “statistical matching”. Each treatment group member is matched to one or 
more comparison group members with similar baseline characteristics. Comparison group 
members who are matched to one or more treatment group members are included in the 
“matched comparison group”: other comparison group members are excluded. Matching is 
designed to select a subgroup of the comparison group that has similar baseline characteristics to 
the treatment group. The goal of matching is to select matched comparison group members 
whose outcomes are as similar as possible to what the outcomes of treatment group members 
would have been in the absence of the treatment. 

One particular form of matching that has become increasingly popular is called propensity 
score matching (Rosenbaum and Rubin, 1983, 1985). Among people with the same probability 
of participating in (or being assigned to) a program, whether or not a person actually participates 
can be treated as a random event, like assignment to the treatment or control groups in a 
random-assignment experiment, provided that participation depends only on observed 
characteristics. Propensity score matching selects matched groups of treatment and 
comparison units with similar participation probabilities or “propensities”— or more typically, 
similar estimated propensities. Through propensity score matching, baseline differences between 
the treatment and comparison groups can be reduced for many baseline variables while using a 
single variable for matching. Dehejia and Wahba (1999) use experimental data from the 
National Supported Work Demonstration to show that propensity score matching can produce 
impact estimates that are very close to the experimental estimates. These results suggest that 
propensity score matching can generate accurate estimates of the impacts of some 
employment-related programs. 
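Propensity score matching first requires an estimated propensity for each claimant. A common choice, assumed here, is a logistic model of program assignment on baseline characteristics; the Python sketch below fits one by plain gradient ascent to avoid external dependencies. The learning rate, iteration count, and covariates are illustrative assumptions, and this section does not say how the propensity scores in this evaluation were specified. Matching then proceeds on the estimated score as on any other continuous variable.

```python
import math
from typing import List, Sequence


def _logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x: Sequence[Sequence[float]], d: Sequence[int],
              lr: float = 0.5, iters: int = 5000) -> List[float]:
    """Fit P(d = 1 | x) = logistic(b0 + b . x) by batch gradient ascent on the log-likelihood."""
    p = len(x[0]) + 1
    beta = [0.0] * p
    n = len(d)
    for _ in range(iters):
        grad = [0.0] * p
        for xi, di in zip(x, d):
            row = [1.0] + list(xi)
            prob = _logistic(sum(b * v for b, v in zip(beta, row)))
            for k in range(p):
                grad[k] += (di - prob) * row[k]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity(beta: Sequence[float], xi: Sequence[float]) -> float:
    """Estimated probability of assignment for one claimant."""
    return _logistic(sum(b * v for b, v in zip(beta, [1.0] + list(xi))))
```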

The results from Dehejia and Wahba raise the following question: can propensity score 
matching generate accurate estimates of the impacts of WPRS? Phase I of this evaluation was 
designed to test the reliability of three different matching methods, including propensity score 
matching, and the reliability of the regression methods used in the WPRS evaluation. Phase II of 
the evaluation was designed to apply matching methods to data from the WPRS evaluation to 
compute revised estimates of WPRS’s impacts on earnings. However, Phase II would only 
proceed if Phase I showed that the matching methods produced more accurate impact estimates 
than the regression methods. 



Therefore, this evaluation can be summarized as follows: 



• Phase I: Testing Different Methods of Estimating Impacts with JSA Data. In Phase 
I, we used data from the JSA Demonstration to assess the reliability of the standard 
regression methods employed in the WPRS evaluation and the matching methods 
developed in this evaluation. 

• Phase II: Applying Matching Methods to Actual WPRS Data. If any of the 
matching methods had produced more accurate impact estimates than the regression 
methods, we would have applied them to obtain revised estimates of the impacts of WPRS. 



Based on the results from Phase I, MPR recommended not to proceed to Phase II of this 
evaluation, and USDOL concurred. This recommendation was based on the finding that the 
regression methods used in the WPRS evaluation produced accurate estimates of earnings 
impacts from the demonstration data. This report provides no evidence that matched comparison 
groups of the types we tested would yield more accurate estimates of the impacts of WPRS. 

The remaining chapters of this report describe the evaluation’s design and the results from 
Phase I. Chapter II describes the design of the analysis samples used in this evaluation. Chapter 
III describes the different methods we tested for estimating impacts. Chapter IV provides the 
impact estimates generated by applying different estimation methods to the analysis samples. 
These estimates support MPR’s recommendation and USDOL’s decision to end the evaluation 
after Phase I. 
CHAPTER II: DESIGN OF THE ANALYSIS SAMPLES 



The design of this evaluation required four steps: selecting the data, identifying analysis 
samples within the data, weighting the samples to ensure that the analysis samples are 
representative of the populations of interest, and specifying methods for estimating program 
impacts. As described in the introduction, we selected data from the JSA Demonstration because 
the demonstration’s treatments roughly correspond to the types of service packages to which 
claimants are assigned under WPRS, and because the demonstration included a randomly 
assigned control group. This chapter describes the second and third design steps— identifying and 
weighting the analysis samples— and leaves the estimation methods to Chapter III. 



A. IDENTIFYING THE ANALYSIS SAMPLES 

The data used in this evaluation were collected to support the evaluation of the Job Search 
Assistance Demonstration in 10 local offices in Florida (Decker et al., 2000). The evaluation of 
the demonstration was designed to provide estimates of the impacts of three different job search 
assistance treatments on claimants who were eligible for the demonstration. From the claimants 
who applied for UI benefits in the 10 demonstration offices during the demonstration, the state of 
Florida identified claimants who were newly unemployed, who did not have a specific date of 
recall to their previous employer, who did not obtain jobs through a union hiring hall, and who 
met a few other eligibility criteria. 1 Final eligibility for the demonstration was determined based 
on profiling scores that were assigned to all claimants meeting the state’s eligibility criteria. This 
score provided an estimate of the probability that the claimant would exhaust his or her 
entitlement of UI benefits in the absence of additional reemployment services. 2 Profiling scores 
were assigned based on a linear model of benefit exhaustion, which was estimated from 
historical UI data. The model of benefit exhaustion included the following variables as 
predictors: education, industry, occupation, and job tenure. 3 Claimants were deemed eligible for 
the demonstration if they met the state’s eligibility criteria, and if they were assigned profiling 
scores greater than 0.4. 
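The eligibility rule can be sketched in Python as below. The coefficients in Decker et al. (1997) are not reproduced in this report, so any coefficient dictionary passed in is hypothetical; the sketch also assumes that categorical predictors such as industry and occupation enter the linear model as 0/1 indicator variables.

```python
from typing import Dict


def profiling_score(claimant: Dict[str, float], coefs: Dict[str, float],
                    intercept: float) -> float:
    """Linear-model estimate of the probability of exhausting UI benefits.

    Predictors missing from `claimant` are treated as 0 (e.g. an indicator
    for an industry the claimant did not work in).
    """
    return intercept + sum(coefs[name] * claimant.get(name, 0.0) for name in coefs)


def demo_eligible(meets_state_criteria: bool, score: float,
                  threshold: float = 0.4) -> bool:
    """Eligible for the demonstration: passes the state's screens and score > 0.4."""
    return meets_state_criteria and score > threshold
```

Note that the cutoff is strict: a claimant with a score of exactly 0.4 is not eligible under this sketch.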

Under WPRS in Florida, claimants are assigned profiling scores using the model of benefit 
exhaustion developed for the demonstration. However, WPRS differs from the demonstration in 
how it uses these scores to target reemployment services. Under WPRS, the claimants with the 
highest profiling scores are assigned to locally provided services subject to local capacity 
constraints. Therefore, the average profiling score should be higher for claimants who would 
have been assigned to WPRS than for demonstration participants. If claimants with higher 
profiling scores have a greater need for reemployment services than other claimants, the average 
impacts of reemployment services might be higher for “WPRS-targeted” claimants than for 
demonstration participants. 
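The WPRS targeting rule can be sketched as follows: within each local office, the highest-scoring claimants fill the available service slots. The sketch assumes each office's claimants form a single pool with a fixed, hypothetical capacity; the timing of actual selection is not modeled, and ties are broken arbitrarily by claimant identifier.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def wprs_selected(claimants: List[Tuple[str, str, float]],
                  capacity: Dict[str, int]) -> Set[str]:
    """Claimants the WPRS rule would refer to services.

    claimants: (claimant_id, office, profiling_score) tuples.
    capacity: hypothetical number of service slots in each local office.
    """
    by_office: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for cid, office, score in claimants:
        by_office[office].append((score, cid))
    selected: Set[str] = set()
    for office, pool in by_office.items():
        pool.sort(reverse=True)  # highest profiling scores first
        selected.update(cid for _, cid in pool[: capacity.get(office, 0)])
    return selected
```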



1 For a complete list of the state’s eligibility criteria, see Decker et al., 1997, pp. 40-41. 

2 Individuals who qualify for UI are entitled to a fixed amount of UI benefits, and most of those who remain on 
UI for 26 weeks exhaust their entitlement. 

3 The coefficient estimates from the profiling score model are provided in Table III.1 of Decker et al., 1997. 